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Abstract — Assume that a Gaussian process £, is predicted from n pointwise 
observations by intrinsic Kriging and that the volume of the excursion set of ^ 
above a given threshold u is approximated by the volume of the predictor. The 
first part of this paper gives a bound on the convergence rate of the approximated 
volume. The second part describes an algorithm that constructs a sequence of 
points to yield a fast convergence of the approximation. The estimation of the 
volume of an excursion set is a highly relevant problem for the industrial world 
since it corresponds to the estimation of the failure probability of a system that 
is known only through sampled observations. 
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1 Introduction 

The problem to be considered in this paper is the estimation of the probability 

Vu ■■= P{/(A) > u}, (1) 

where /(x) is a real function defined over an arbitrary set X (X = [0, 1]'' or 
X — W^, in most situations) endowed with a probability measure fi and X E X 
is a random vector with the distribution /i. In practice, the estimation of ^ is 
based on a finite sequence of evaluations of / at points {xi)i<i<n in X. Another 
way of looking at HI) is via the excursion set 

AM) e X : fix) > u} (2) 

of the function / above the level u, since P{f{X) > u} is the volume ^i{Au{f)), 
hereafter denoted by 



Such a problem is frequently encountered in engineering: the probability 
that the inputs of the system will generate a level of a function of the outputs 
that exceeds a specified reference level may be expressed as ^ (where in this 
case, X is the vector of the inputs of the system and / is a statistic of the 
outputs) . Since to obtain the value of / at a given x may be very expensive in 
practice, because it may involve heavy computer codes for instance, it is often 
essential to estimate Vu using as few evaluations of / as possible. 

To overcome the problem of evaluating / many times, one possible approach 
is to estimate |^u(/n)| instead of |^„(/)|, where /„ is an approximation of / 
constructed from a small set {/(xi), . . . , /(a;„)} of pointwise evaluations. Such 
an approximation can be obtained by assuming that / is a sample path of a 
Gaussian random process ^ and by using a linear predict or of £ constru cted 
from ^{xi), z = 1, . . . ,n. In this paper, intrinsic Kriging (jMatheronl . will 
be used to obtain We shall show in Section [2] that this method is Hkely to 
give faster convergences than the classical Monte Carlo estimators, depending 
on the regularity of 

A second step is to choose a sequence of evaluation points (xi) so that 
|^u(Cn)| — conditioned on the random variables ^{xi), i < n, converges 

rapidly to zero. Section[3]presents an acceleration algorithm based on computing 
an upper bound of the mean square error of volume approximation conditioned 
on the events {£,{xi) = f{xi), i = l...,n}: a point Xn+i is selected so that 
evaluating /(x„+i) yields the potential largest decrease of the upper bound. 
Section [4] provides a numerical example. 



2 Excursion set volume estimation by intrinsic 
Kriging 

This section deals with the estimation of the probability Vu from observations 
of / at a finite sequence of points {xi)i<i<n- As mentioned above, Vu is the 
volume of Au{f) under the probability distribution ^. We assume moreover 
that / is a sample path of a (separable) Gaussian process ^, with mean m{x), 
a; S X, and covariance k{x,y), {x,y) E X^. 



2.1 Monte Carlo estimation 

Monte Carlo is a commonly used method to estimate |^u(^)|. The volume of 
excursion of a Gaussian process ^ may be estimated by 

1 ' 

\AuiO\i jJ2Miix.)>u} \Aum a.s. (3) 

1=1 

where the XiS are independent random variables with distribution fi. The esti- 
mator ((HI) is unbiased, since i?[|A„(f)|; | ^] = |A„(^)|, and 

E [i\Aumi \Aumf I e] = ]\Aum^ \mo\) ■ 

If evaluating / (a sample path of ^) at many points of X is not particularly 
demanding, then estimating |^,((/)| is straightforward. However, if |^„(/)| 
is small, then the variance of the Monte Carlo estimator is approximately 



2 



\Au{f)\/l. To achieve a given standard deviation K\Au{f)\, with k > small, 
the required number of evaluations is approximately i.e. it is 

high. Thus, the convergence of ^ may be too slow in many real applications 
where doing a lot of evaluations of / may not be affordable (for instance, / 
may be a complex computer simulation and may take hours or days to run) . Of 
course, many other methods have been proposed to improve the basic Monte 
Carlo converg ence. For instanc e, methods based on importance sampling, on 
cross-entrop y iRubinsteinl. 1999h. on the classical extreme value theorv fe.g. Em- 



brechts et al." 119971 ) . etc. They are not considered here for the sake of brevity. 

2.2 Estimation based on an approximation 

An alternative approach is to replace / by an approximation /„ constructed 
from a set of n point evaluations of /. Provided /„ converges rapidly enough 
to /, one expects a good estimation of the excursion sets and their volume 
using only a few evaluations of /. There are many ways of constructing such an 
approximation. Let us mention two classical methods: regularized regressions 
in reproduci ng kernel Hilbert spaces, e.g. sphnes or radial basis functions (see 
for instance IWendlandl . [20Q5f ). an d hnear prediction of rand om processes, also 



known as Kriging (see for instance Chiles and Delfiner . 19991 ). In this paper, we 



shall adopt the probabilistic frameworl0. 



Thus, let us consider that an unbiased linear estimator ^„ of ^ has been 
obtained from £,{xi), . . . , ^(x„). In particular, we can use ordinary Kriging when 
the mean of £,{x) is known and intrinsic Kriging when it is unknown, which is 
more often the case. 

Can we expect a faster convergence when ^ is replaced by Here, we 
assume the computation time to evaluate ^nix), x e X, conditioned on £,{xi) = 
f{xi), i = 1, . . . , n, is small, which means that we can make |A„(^„)|; — |A„(^„)| 
neghgible with respect to |A„(^„)| — |A„(^)|. Thus, we are now interested in 
the convergence of |yl„(^„)| to |Au(^)|. Section [2 . 2 . 21 shows how the convergence 
rate in mean square of |A„(^„)| to |^i,(^)| depends on the fill distance of X and 
the regularity of ^. In Section [3l we shall propose an algorithm to speed up this 
rate by a sequential choice of the evaluation points. 

2.2.1 Intrinsic Kriging basics 

In this paper, we use intrinsic Kriging (IK) to obtain a linear predictor of ^ based 
on a fin ite set of pointwi se observations of the process. We recall here the main 
results (Matheron, IK extends linear prediction when the mean of £,{x) 



is unknown but can be written as a linear parametric function m(x) = b^p{x). 
Here, p{x) is a g-dimensional vector of base functions of a vector space M of 
translation-stable functions (in practice, all polynomials of degree less or equal 
to I) and 6 is a vector of unknown parameters. Intrinsic Kriging assumes that 
observed values of / are samples from a representation of an intrinsic random 
function (IRF) , a generalized random process defined over a space A; of measures 
orthogonal to J\f, and characterized by its stationary generalized covariance k{h) 
(see the Appendix Section for more details). 



^In fact, thes e two classes of methods, whi ch have been studied separately, are equivalent 
(see for instance iKimeldorf and Wahbal ijigTOf )*). 
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Proposition 1 (Intrinsic Kriging, MatheronI 19731 ). Let he an IRF{1), with 
generalized covariance k{h). Assume n observations be sample values of the 
random variables = £,{xi) + Ni, i = 1, . . . ,n, where ^ is an unknown rep- 
resentation of and the NiS are zero-mean random variables independent of 
^ (x) , with covariance matrix Kj\j . 

The intrinsic Kriging predictor of^{x) based on the observations, is the linear 
projection £,n{x) = J2i ^hx£,xY of £,{x) onto Hs = span{^°^^ i = 1, . . . , n}, such 
that the variance of the prediction error ^(x) — S,nix) is minimized under the 
constraint Sx — ^i,xSxi € A;. The coefficients Xi,x, i = I, . . . ,n, are solutions 
of a system of linear equations, which can be written in matrix form as 



K + Kn P'^ \f \x 

p M M 



kx 

Px 



(4) 



where K is the nx n matrix of generalized covariances k{xi ~ xj) , P is a q x n 
matrix with entries a;/ for j = 1, . . . ,n and multi-indexes i = {ii, . . . ,id) such 
that \i\ := ii -\- ■ ■ ■ -\- id < I , ^J, is a vector of Lagrange coefficients, kx is a vector 
of size n with entries k{x — Xi) and Px is a vector of size q with entries x\ i 
such that \i\ < I. 

The variance of the prediction error is given by an{x)^ :— var[^(a;)— = 



A^, kx 



fJ-^Px- 



m 

Proof See lMatheroiil (|l973l ). 



□ 



2.2.2 Asymptotics 

In this section, we shall justify that modehng the unknown / by a Gaussian 
random process ^ and estimating by |^«(Cn)| is well-founded. Our ob- 

jective is to establish a mean square convergence when the evaluation points fill 



Cl assical results in approxi n iation theory (see for in s tance Wu and Schabacld . 

19931 : lLight and Wavnel . ll998l : [Narcowich et aP . l2Q03l : IWendlandl . l2Q05l ) assert 
that the variance cr^j (x) of the IK prediction error at x decreases as the sampHng 
density or the regularity of the covariance increases. More precisely, if X is a 
bounded domain of R"*, and the Fourier transform of k{h), h E R'^, satisfies 



with I' > d/2, then 



1 +11^11^)-^ <fc(^)<C2(l+ ll^ll)- 



ik„(.)iioo<c/ir'/', 



(5) 



where /i„ = sup^^gx miniHy — Xi\\2 is a fill distance of (xi, . . . , x„) in X. 

The following theorem shows that a similar result holds for the process 
thresholded at a level u. 

Theorem 1. Let ^ be an unknown representation of an IRFfl ) £,g, O'fid in{x) be 
the IK predictor of based on observations ^{xt), i — 1, . . . ,n. Define CTn(x) :— 
var[^{x)~^r.{x)f^. Then, 



E 



(le(j;)>« -~ lc„(x)>«)^ = 0(cr„(a;)|log(a„(x))|^/^) when o-„(x) -> . 
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Proof. For all a; € X, ^{x) — (,n{x) is Gaussian with zero-mean and variance 
(T„(.x)^ (but is not orthogonal to S,n{x), as would be the case if the mean of ^ 
were known). Thus, Vx e X and Vn e N, we can write ^(x) as 



£,{x) = (1 + a„(x))^„(x) + hn{x) + Cn(a;) , 



(6) 



where a„(x),6„(x) G M, Cn(2;) is Gaussian and such that E[^„ (x)C„(.x)] = and 
E[i^„(x)] = 0. This decomposition exists and is unique for every n. (To simplify 
notations, from now on, we shall omit the dependence on x when there is no 
ambiguity.) 

Clearly, var[^„] is non-decreasing and can be assumed to be strictly positive 
for n large enough. Since E[a„^„] = — 6„, we have 

al = var[a„^„ + hn + Cn] = E[(a„Cn + bn + Cnf] = a-l var[^„] + E[Cl], (7) 

and thus, the following upper bounds hold for n large enough: 

\an\ < Ka \<KbUn, 

dn :=E[C2]i/2<^„, 

for some Ka, Ki, > 0. 

For some threshold u gR, let a be such that 

a > |a„u -I- 6„| > , 

and let e N be such that Vn > N, |a„| < 1. For all n> N, define 

u — br, — a 



(8) 



(9) 



< 



K = 
K = 

Note that h~ <u <ht and that 



1 + a„ 
u-hn + a 



ht - h„ = 



2a 



For all n> N, 



E 



(l5(a:)>M - '^U(x)>uf I £,n{x) 



4- 



U-{1 + an)£,n - K 



W - (1 + an)^n - W. 



'-£,„(x)<u 



l{„(x)>M : 



(10) 



in which denotes the tail of the standard Gaussian distribution function. 
Since 

<h~ ^ U-{1+ an)in - bn > a , 
in>h+ -U+{1 + an)^n + bn > a , 



and an < (Jn, we have 



E 



(l{(x)>« - l£„(x)>u)^ I ^n{x) 



< * — 1 



'-Uix)eM\lhn,ht] '^U{x)e[h- .h+] 



(11) 
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By integrating with respect to the density of , we obtain 



E 



2a 



Co 



1 + a„ 

^,2 



< 



aV^TT ' ' 20-2 ■ 



(12) 
(13) 



where (|T3|) uses a standard Gaussian tail inequahty. 

The upper bound can be tighten by replacing a with a sequence (a„) such 
that 

an ■■= \/2cr„|log(cr„)|^/^ 
which satisfies ^ for n large enough. Therefore, 



E 



(l{(3;)>u - l4„(a;)>u)^ < 0(o"„|log(cr„)|^/^) whcu Cr„ 



(14) 

□ 



Hence, if X is bounded: 



E 



(1^.(^)1 -K(^„)|)= 



= E 



< 



E[(l 



15„(£!;)>«)^]'^A* 



< qk„(.)||oo|iog||a„(.)||oor/' 



(15) 



when n ^ and ||cr„(.)||oo 0. 

Therefore, this simple result shows that the mean square convergence of 
l^ul^n)! to |^u(C)l is related to the mean square convergence of ^„ to ^, hence, 
due to (O, to the regularity of the covariance and the fill-in distance of X. 
Informally speaking, we can say that using an approximation will be more ef- 
ficient than a mere Monte Carlo approach if the regularity of ^ compensates 
for the slowness of filling X, which of course increases as the dimension d of X 
increases. By choosing the a^jS on a lattice, the fill distance can be made such 
that hn = 0{n-^^'^). Then, the convergence of |A„(^„)| to |^k(^)| when the XiS 
fill X regularly, is faster than Monte Carlo if ly > 3d/2. 



3 Convergence acceleration 
3.1 Control of convergence 

Of course, sampling X regularly as above may be suboptimal when the eval- 
uations of / are sequential. This section addresses the problem of choosing 
a sequence (a;„)„gN so that the error of volume approximation conditioned on 
^{xi) = f{xi), i — 1,2, .. . decreases rapidly. More precisely, a desirable strategy 
would consist in choosing 

Xn = argmin T„(x„) := E [(|A„(0| - \Au{^n)\ f \ ^n-i] , (16) 

a;„GX 
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where for all n, Z„ = {^{xi), . . . ,^(a;„)). Note that T„(a;„) can also be written 
as 

T„(x„) = E [E [(|A„(0| - \AuiU\)^ I Z„] I . (17) 

The distribu tion of |A„(^)| conditioned on observations is generally unknown 
(see lAdlerl - EoOol . Section 4.4) and therefore, E [(|Au(0|-|A„(^„)|)2 | Z„] cannot 
be easily determined analytically. To overcome this difficulty, we could minimize 
a Monte Carlo approximation of (fT6|) instead, namely 



Xn — argmin T,; 

x„ex 



i('^n) 



E 



i=l 



Cn)\~\AuiU\? 



{C,*<m} 



(18) 



where the random processes ^ire m independent copies of ^ conditioned on 
Zn — (0, ... ,0). The program l|18p becomes numerically tractable if we also 
replace |Am(OI by its Monte Carlo estimator |A„, (•) !;. Whereas simulating the 
conditioned processes is easy in principle (see Chiles and Delfineil . 1 19991 . 
chap. 7), it is also computationally intensive since it typically requires 0{P) 
operations to simulate ^ at given points xi, . . . ,xi. Since I has to be high enough 
to ensure a degree of accuracy of the estimator (•)!;, conditional simulations 
ought to be avoided. 

An alternative solution is to approximate E [(|Atj(OI ~ |A«(Cn)l)^ I ^n] by 
E[(|A„(0|/ - \A^{U\i? I Zn, < for ; high enough. Then, the 

Minkowski inequality gives 



E[(|A„(0|/-|A„(e„)|/)2 I Z„, {X,,i<l} 



,1/2 



< 



'■e„(x,)> 



„)2 |z„, {X,,i<l}] 



1/2 



(19) 



1=1 



This makes possible to build a stepwise uncertainty reduction algorithm as pre- 
sented in the next section. 



3.2 A stepwise uncertainty reduction algorithm 

Denote by 5 = {yi, ... ,yi} a set oi I independent sample values of X. Given 
a finite sequence (a;i)i<i<„_i of evaluation points, we wish to obtain a new 
point Xn that yields the largest decrease of the upper bound of the volume 
approximation mean-square error obtained in (fT9|) . i.e., 

Xn = argmin T',(a;„) := y E ^ [(l«(a>)>« ~ '^iAy^)>uf I Bn-i ] ^ ^ (20) 

where Bn denotes the event {£,{xi) — /(xi), . . . ,^(x„) = f{xn)}, n > 0. 

A few steps are needed to transform l(20l) into a numerically tractable pro- 
gram. First, note that 

^ [i^^{yi)>w'^^,^{y^)>u)'^ I Bn-i] 

= / ^[ih{y.)>u~'i-i„{y.)>-)^\^i^n)^z,Bn-i] (21) 
JzeR 
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where p^(x)\Bn^i denotes the density of ^(x) conditionally to Bn-i. However, 
intrinsic Kriging assumes that the mean of ^ is unknown and therefore, for 
X € X, E [(Ij^^-j^^— I ^{xn) = z, Bn-i] cannot be determined exactly. 
Indeed, the values of a„(a;),6„(x) and an{x) in ifTOl) are unknown in practice. 
Nevertheless, iJH) leads to the approximation 

E [{l^(x)>u - l?„(a;)>«)^ I £,n{x)] « Vn{x) := * 

Finally, define a discretization operator Ag, which can be written for in- 
stance as 

Q 

VheM., Aqh ^ zi+ ^{z^ - z^^i)l^^^^+rx,^{h) 

i=2 

with zi < Z2 < ■ ■ ■ < zq. We can now write ((20|) as a numerically tractable 
program: 

Xn = argmin T'^{xn) := 

y E ( E P{^Qf (^«) = Bn-i} E [vn{y^) I ?(x„) = z„ B„_i] j . (23) 

An informal interpretation of (|23|) is that a;„ minimizes the error of predic- 
tion of by which is measured via u„(x), averaged on X under 
the distribution /x, and conditioned on the observations. When T"(x) becomes 
small for all x G 5, |An(^„)|; conditioned on observations provides a good ap- 
proximation of Vu- As will be seen in Section IH the proposed strategy is likely 
to achieve very efficient convergences. 



- ^n{x) 
(7n{x) 



(22) 



4 Example 

This section provides a one-dimensional illustration of the proposed algorithm. 
We wish to estimate ([T]), where f{x) is a given function defined over K and 
X ^ fi — J\f{0,a^). We assume that / is a sample path of After a few 
iterations, the unknown function / (as shown in Figure 1) has been sampled so 
that the probability of excursion P{^(a;) > u \ ^{xi) = f{xi),i = 1 . .. ,n} is 
determined accurately in the region where the probability density of X is high. 
This example illustrates the effectiveness of the proposed algorithm. Note that 
in practice, a parametrized covariance has to be chosen for ^ and its parameters 
should be estimated from the data, using, for instance, a maximum likelihood 
approach (e.g. Stein . 1999l l. 



5 Appendix : Intrinsic Random Functions 

In this section, we inten d to summarize t he most important notions about in- 
trinsic random functions I Matheronl . 19731 ). Let A/" be a vector space of functions 



{b^r{x), b £ R'} and £,{x) be a random process with mean m{x) £ J\f. The 
main idea of intrinsic random functions is to find some linear transformations 
of ^{x) filtering out the mean so as to consider a zero-mean process again. 
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Figure 1. Top: threshold u (horizontal soHd Hne), function / (thin hne), n=10 
evaluations as obtained by the proposed algorithm using I = 800 and Q = 20 
(squares), IK approximation /„ (thick line), 95% confidence intervals computed 
from the IK variance (dashed lines). Middle: probability of excursion (solid 
line), probability density of X (dotted line). Bottom: graph of T'fjyi), i = 
I,. . . ,1 = 800, the minimum of which indicates where the next evaluation of / 
should be done (i.e., at approximately 0.75). 
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Let A be the vector space of finite- support measures, i.e. the space of hnear 
combinations J27=i ^i^xi, where stands for the Dirac measure, such that for 
any I? C X, Sx{B) equals one if x € B and zero otherwise. Let A^j_ be the 
subset of the elements of A that vanish on Af. Thus, A £ A^j. implies 

n 

i=l 

In the following, we shall restrict ourselves to the case where A/" is a vector space 
of polynomials of degree at most equal to I. Denote by A/; the linear hull of all 
multivariate monomials x*, where i = are multi-indexes such that 

\i\ := ii + ■ ■ ■ + id < I, and define A; := ^x^-^- 

Let ^g(A) be a hnear map on A;, with values in L^{Q,,A, P), the space of 
second-order random variables. Assume that E[^g(A)] = for all A and that 

k{X,iJ.) := cov[^g(A),^g(a')] = ^ >^ilJ'jk{^i,yj) , 

where k(x,y) is a symmetric conditionally positive definite function (i.e. a 
function such that k{x, y) = k{y, x) and k{X, A) > for all A G A;). Then, Cg(A) 
is a generalized random process and k{x, y) is called a generalized covariance 
(note that any covariance is a generahzed covariance). Let "Hi be the subspace 
of L'^in.A.P) spanned by &(A), A € A;. Since random variables in Hi are 
zero- mean, the inner product of L^{Q,A, P) can be expressed in Hi as 

(^g(A),Cg(/u))l2(q,^,p) = k{X,fi) , A,/x G A( . 

Thus, the bilinear form k(X,fj,) endows A; and Hj\f± with a structure of pre- 
Hilbert space. The completions Hi and A; oiHi and A; under this inner product 
define isomorphic Hilbert spaces. Cg(A) can be extended on A; by continuity. 
Simplifying hypotheses are introduced in the next paragraph. 

Let T/i : A; ^ A be the translation operator such that for A = XiSx^ S A;, 
ThX = XiSxi+h- Note that A/ is stable under translation since Afi is itself 
a translation-stable space of functions. Assume further that the generalized 
covariance k{x,y) is invariant by translation. In the following, we shah write 
k{h) with h = x — y instead of k{x,y), when the covariance is assumed to be 
stationary. Then Th is continuous and can be uniquely extended on A;. 

Definition 1. Let Cg(A) be a zero-mean generalized random process defined 

on hi, with stationary generalized covariance k{h). The random process h 
^g('''?!,A), a G A;, is therefore weakly stationary. S,g{^), X G A;, is then an 
Intrinsic Random Function of order I, or IRF{1) in short. 

If ^{x), a; G X, is a second-order random process, with mean in Afi and 
covariance k{x,y), the linear map 

^: Ai ^ ^ H 

extends ^(x) on A;, where H stands for the Hilbert space generated by ^{x), 
a; G X. Since k{x,y) is positive definite, (A, /i)^ := (^(A), ^(/i))7^ defines an 
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inner product on A/. Let A; be the completion of A/ under this inner product 
and extend ^(A) on A; by continuity (a generaUzed random process is thus 
obtained). 

Definition 2. Let Cg(A) he an IRF{1). A second-order random process ^{x), 
a; S X, is a representation of £,gW iff 

CG(A) = e(A), VAeA,. 

If Co(a^) is any representation of ^g(A), other representations of ^g(A) can be 
written as 

ax)=Ux)+J2B^P^{x), (24) 
i=l 

where the piS form a basis of A// and the BiS are any second-order random 
variables. Thus, the repres entations of an I RF(0 constitute a class of random 
processes with mean in Afi ( Matheron . 1973[ ). 
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